60 research outputs found
14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon
Chemistry and materials science are complex. Recently, there have been great
successes in addressing this complexity using data-driven or computational
techniques. Yet the need for input structured in very specific forms and the
ever-growing number of tools create usability and accessibility challenges.
Because much of the data in these disciplines is also unstructured, the
effectiveness of these tools is limited.
Motivated by recent works that indicated that large language models (LLMs)
might help address some of these issues, we organized a hackathon event on the
applications of LLMs in chemistry, materials science, and beyond. This article
chronicles the projects built as part of this hackathon. Participants employed
LLMs for various applications, including predicting properties of molecules and
materials, designing novel interfaces for tools, extracting knowledge from
unstructured data, and developing new educational applications.
The diverse topics and the fact that working prototypes could be generated in
less than two days highlight that LLMs will profoundly impact the future of our
fields. The rich collection of ideas and projects also indicates that the
applications of LLMs are not limited to materials science and chemistry but
offer potential benefits to a wide range of scientific disciplines.
Theory and application of medium to high throughput prediction method techniques for asymmetric catalyst design
With the use of computational methods in the field of drug design becoming ever more prevalent, there is pressure to port these technologies to other fields. One of the fields ripe for application of computational drug design techniques, specifically virtual screening and computer-aided molecular design, is the design and synthesis of asymmetric catalysts. Such methods could either guide the selection of the optimal catalyst(s) for a given reaction and a given substrate or provide an enriched selection of highly efficient asymmetric catalysts, enabling synthetic chemists to focus on the most promising candidates. This would in turn save time and reduce the costs associated with the synthesis and evaluation of large libraries of molecules. However, to be applicable to the evaluation of a large number of potential catalysts, speed is of utmost importance. This impetus has led to the development of medium to high throughput virtual screening (HTVS) methods for asymmetric catalyst development or assessment, although very few applications have been reported. These methods typically fall into four classes: methods combining quantum mechanics and molecular mechanics (QM/MM), pure molecular mechanics-based methods (a class which can be subdivided into static and dynamic transition state modeling), and lastly quantitative structure selectivity relationship (QSSR) methods. This review will cover specific methods within these classes and their application to selected reactions.
Peer reviewed: Yes. NRC publication: Yes
Single-Point Mutation with a Rotamer Library Toolkit: Toward Protein Engineering
Protein engineers have long been hard at work to harness biocatalysts
as a natural source of regio-, stereo-, and chemoselectivity in order
to carry out chemistry (reactions and/or substrates) not previously
achieved with these enzymes. The extreme labor demands and exponential
number of mutation combinations have induced computational advances
in this domain. The first step in our virtual approach is to predict
the correct conformations upon mutation of residues (i.e., rebuilding
side chains). For this purpose, we opted for a combination of molecular
mechanics and statistical data. In this work, we have developed automated
computational tools to extract protein structural information and
created conformational libraries for each amino acid dependent on
a variable number of parameters (e.g., resolution, flexibility, secondary
structure). We have also developed the necessary tool to apply the
mutation and optimize the conformation accordingly. For side-chain
conformation prediction, we obtained overall average root-mean-square
deviations (RMSDs) of 0.91 and 1.01 Å for the 18 flexible natural
amino acids within two distinct sets of over 3000 and 1500 side-chain
residues, respectively. The commonly used dihedral angle differences
were also evaluated and performed worse than the state of the art.
These two metrics are also compared. Furthermore, we generated a family-specific
library for kinases that produced an average 2% lower RMSD upon side-chain
reconstruction and a residue-specific library that yielded a 17% improvement.
Ultimately, since our protein engineering outlook involves using our
docking software, Fitted/Impacts, we applied our
mutation protocol to a benchmarked data set for self- and cross-docking.
Our side-chain reconstruction does not hinder our docking software,
demonstrating differences in pose prediction accuracy of approximately
2% (RMSD cutoff metric) for a set of over 200 protein/ligand structures.
Similarly, when docking to a set of over 100 kinases, side-chain reconstruction
(using both general and biased conformation libraries) had minimal
detriment to the docking accuracy.
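The RMSD metric quoted in this abstract can be illustrated with a minimal sketch. It assumes that predicted and reference side-chain atoms are already matched by order and share a coordinate frame (for example, because the backbone is held fixed), so no superposition is performed. The abstract does not state these details, and the function name `side_chain_rmsd` is hypothetical, not part of the authors' toolkit.

```python
import math


def side_chain_rmsd(predicted, reference):
    """Root-mean-square deviation between two matched lists of (x, y, z) atoms.

    Assumption: atoms are paired by position in the lists and both sets are
    in the same coordinate frame; no fitting or superposition is done here.
    The result is in the same length unit as the inputs (e.g. angstroms).
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for atom_pred, atom_ref in zip(predicted, reference):
        # Accumulate the squared distance between each matched atom pair.
        squared_sum += sum((p - r) ** 2 for p, r in zip(atom_pred, atom_ref))
    return math.sqrt(squared_sum / len(predicted))


# Toy example: two atoms, the second displaced by 1.0 along z.
example_predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
example_reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
example_rmsd = side_chain_rmsd(example_predicted, example_reference)
```

Averaging such per-residue values over a test set would give an overall figure comparable in kind to the 0.91 and 1.01 Å values reported above, although the exact averaging scheme used by the authors is not given here.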
Customizable Generation of Synthetically Accessible, Local Chemical Subspaces
Screening large libraries of chemicals has been an efficient strategy
to discover bioactive compounds; however, the potential for success
is limited by the available libraries. Synergizing combinatorial
and computational chemistries has emerged as a time-efficient strategy
to explore the chemical space more widely. Ideally, streamlining the
evaluation process for larger, feasible chemical libraries would become
commonplace. Thus, combinatorial tools and, for example, docking methods
would be integrated to identify novel bioactive entities. The idea
is simple in nature, but much more complex in practice; combinatorial
chemistry is more than the coupling of chemicals into products: synthetic
feasibility includes chemoselectivity, stereoselectivity, protecting
group chemistry, and chemical availability which must all be considered
for combinatorial library design. In addition, intuitive interfaces
and simple user manipulation are key for optimal use of such tools
by organic chemists, and crucial for the integration of such software
in medicinal chemistry laboratories. We present herein Finders and React2D, integrated into the Virtual Chemist platform, a modular software suite. This approach
enhances virtual combinatorial chemistry by identifying available
chemicals compatible with a user-defined chemical transformation and
by carrying out the reaction leading to libraries of realistic, synthetically
accessible chemicals, all with a completely automated, black-box,
and efficient design. We demonstrate its utility by generating ∼40
million synthetically accessible, stereochemically accurate compounds
from a single library of 100 000 purchasable molecules and
56 well-characterized chemical reactions.
The Second CACHE Challenge - Targeting the RNA-Binding Pocket of the SARS-CoV2 Nonstructural Protein 13 via a consensus-scoring method and FITTED templated docking.
Disrupting the Nonstructural Protein 13 (NSP13) in SARS-CoV2 could provide a promising avenue for the treatment of COVID-19 and help reduce its enormous health burden. As part of the second CACHE challenge, we targeted each of two sub-pockets of the NSP13 RNA-binding site via a multi-pronged virtual screening (VS) campaign, using the latest functionality in FITTED, our docking program and part of the FORECASTER drug discovery suite. After extensive structure preparation and docking (rigid and flexible), we evaluated predicted poses from the VS using four approaches: docking score, machine learning (graph neural network), quantum mechanics, and visualization, with the final selection based on the consensus of all four approaches. Additionally, we implemented templated docking within FITTED to take advantage of fragments co-crystallized with NSP13, which supplemented our consensus selection. We now await the experimental testing of our predictions by the Structural Genomics Consortium, and once available, we will update this manuscript accordingly. In sharing our approach and findings, we hope to continue contributing to open science and engaging in the ongoing effort of the scientific community towards ending COVID-19.
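The abstract does not say how the consensus across approaches was computed. As one possible illustration only, the sketch below uses mean-rank aggregation, a generic consensus technique swapped in here and not the authors' stated method. The compound labels, score values, and the name `consensus_order` are all hypothetical, and visual inspection is left out because it does not reduce to a numeric score.

```python
def consensus_order(score_tables, higher_is_better):
    """Order compounds by their mean rank across several scoring methods.

    score_tables: {method_name: {compound_id: score}}, same compounds in each.
    higher_is_better: {method_name: bool} giving the direction of each score.
    Ties in mean rank are broken alphabetically by compound id.
    """
    compounds = sorted(next(iter(score_tables.values())))
    mean_rank = {compound: 0.0 for compound in compounds}
    for method, scores in score_tables.items():
        # Best compound first for this method, respecting score direction.
        ordered = sorted(compounds, key=lambda c: scores[c],
                         reverse=higher_is_better[method])
        for rank, compound in enumerate(ordered, start=1):
            mean_rank[compound] += rank / len(score_tables)
    return sorted(compounds, key=lambda c: (mean_rank[c], c))


# Hypothetical scores for three made-up compounds.
example_scores = {
    "docking": {"A": -9.0, "B": -7.5, "C": -8.2},  # lower is better
    "gnn": {"A": 0.7, "B": 0.4, "C": 0.9},         # higher is better
    "qm": {"A": -30.0, "B": -12.0, "C": -25.0},    # lower is better
}
example_direction = {"docking": False, "gnn": True, "qm": False}
example_ranking = consensus_order(example_scores, example_direction)
```

Rank averaging is used here because it puts scores with different units on a common scale without normalization. A real campaign might instead require agreement above a threshold in each method.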
Fluoride-Mediated Desulfonylative Intramolecular Cyclization to Fused and Bridged Bicyclic Compounds: A Complex Mechanism
We previously reported the synthesis of polysubstituted chiral
oxazepanes in three steps from commercially available starting materials.
The unexpected reaction of one of these 1,4-oxazepanes in the presence
of TBAF provided a 4-oxa-1-azabicyclo[4.1.0]heptane core. This unusual
process significantly increased the complexity of the molecular scaffold
by introducing a bicyclic core. Surprisingly, the generated bicyclic
structure featuring three stereocenters was a mixture of enantiomers
with no other diastereomers observed. These striking experimental
observations warranted further investigation. A combination of experimental
and computational investigations unveiled a complex diastereoselective
mechanism. Mechanistic rationale is presented for this observed rearrangement
- …